1.  Introduction

People  use  and  extend  their  knowledge  of  the  physical  world  constantly.  Understanding 
how  this  fluency  is  achieved  would  be  an  important  milestone  in  understanding  human  learning 
and  intelligence,  as  well  as  a  useful  guide  for  constructing  machines  that  learn.  Our  purpose  is  to 
construct  a  computational  account  of  human  experiential  learning  in  physical  domains. 

We  are  still  at  the  stage  of  refining  the  questions  rather  than  providing  detailed  answers.  In 
many  cases,  there  is  no  direct  evidence  for  our  claims.  In  other  instances,  support  for  the  theory 
is  obtained  by  combining  evidence  from  several  different  areas,  including  developmental 
psychology,  studies  of  learning,  and  other  psychological  research.  No  one  of  these  is  adequate  by 
itself.  When  extrapolating  from  adult  learning  research,  we  must  keep  in  mind  that  cases  of  pure 
experiential  learning  are  rare  in  adult  life;  some  sort  of  instruction  or  prior  expectation  is 
typically  involved.  Developmental  research  provides  a  good  source  of  data,  since  much  of  young 
children’s  learning  is  truly  from  direct  experience.  Yet  when  developmental  results  are  applied  it 
must  be  remembered  that  children  are  not  only  learning,  but  also  maturing.  Therefore,  in  order 
to  isolate  and  study  experiential  learning,  the  existing  empirical  findings  must  be  examined, 
filtered,  and  carefully  fitted  together.  Although  space  does  not  permit  detailing  all  the  relevant 
lines  of  evidence,  we  will  try  to  give  the  reader  some  justification  for  our  claims  whenever 
possible. 

The past few years have seen significant progress in machine learning. However, to construct
programs  that  learn  as  well  as  (or  better  than)  people  do,  it  is  important  to  understand  how 
human  learning  works.  Ultimately  both  psychological  studies  and  direct  computational 
experiments  (i.e.,  constructing  programs)  will  be  necessary  to  provide  a  full  account.  To  this 
end,  we  will  try  when  possible  to  indicate  how  techniques  developed  in  machine  learning  might  be 
used  to  implement  such  programs. 


1.1.  Overview 

A brief prolog may help to organize the material. Three key ideas underlie the theory: (1)
the centrality of physical processes in mental models of science; (2) the importance of analogy in
learning; and (3) the primacy of rich, contextually specific representations. The idea that the
notion of process is central to human knowledge about physical domains is the chief tenet of
Qualitative Process (QP) theory (Forbus 1981; Forbus, 1984). This is not to say that notions of
process  are  there  from  the  beginning.  Rather,  we  hypothesize  that  a  person’s  experiential 
knowledge  of  a  domain  begins  as  a  collection  of  scenarios  that  describe  particular  phenomena, 
out  of  which  is  developed  a  vocabulary  of  processes  that  provide  a  notion  of  mechanism  for  the 
domain.  The  second  key  idea  concerns  the  role  of  comparisons  among  related  knowledge 
structures.  We  conjecture  that  much  of  experiential  learning  proceeds  through  spontaneous 
comparisons  -  which  may  be  implicit  or  explicit  -  between  a  current  scenario  and  prior  similar  or 
analogous scenarios that the learner has stored in memory. Structure-mapping theory (see
Gentner 1980; Gentner, 1983) describes these kinds of comparisons.

The third idea is a rather paradoxical claim: in human processing, more is often easier.1
Rich, perceptually based representations are acquired earlier in learning than sparse abstract
representations. That is, early domain representations differ from more advanced
representations of the same domain in containing more information, especially perceptual
information specific to the initial context of use and acquisition. A second aspect of the “more is
easier” claim concerns comparisons: we suggest that, for humans, similarity comparisons are
easier when there is more overlap between the two knowledge structures being compared.

1 It should be noted that psychologists by no means generally agree with this claim. Consequently, we will try to
be fairly explicit in presenting evidence for this position.

On  the  basis  of  these  three  ideas,  we  propose  a  canonical  learning  sequence.  The  claim  is 
that  human  experiential  learning  of  physical  domains  can  be  viewed  as  a  sequence  of  different 
mental models: (1) protohistories, (2) the causal corpus, (3) naive physics, and (4) expert models.
Briefly,  protohistories  are  rich,  contextually  specific,  highly  perceptual  representations  of 
phenomena,  capturing  expectations  about  typical  phenomenological  patterns  -  for  example,  “If  I 
turn  the  key,  the  car  will  start.”  With  the  causal  corpus,  the  expectation  of  mechanism  enters; 
here  the  representation  consists  of  simple  statements  that  some  sort  of  causal  connection  exists 
between  variables  -  “If  the  car  has  no  gas,  it  will  not  start.”  In  the  naive  physics  stage,  processes 
are  introduced  to  provide  the  mechanism  underlying  the  causal  corpus  -  “Gas  must  flow  from  the 
tank to the carburetor and mix with air so that the mixture can be ignited by the spark.” The
disparate local connections of the causal corpus are replaced with qualitative models organized
around  the  notion  of  process.  Finally,  in  the  expert  models  stage,  quantitative  representations 
are  created  -  for  example,  models  of  the  effects  of  different  mixtures  of  oxygen  and  gasoline. 

In  this  paper  we  discuss  our  conjectures  about  these  models  and  how  a  learner  constructs 
one  type  of  model  from  another.  First,  however,  the  component  theories  that  underlie  this 
framework  are  briefly  summarized:  Qualitative  Process  theory,  which  provides  concepts  needed 
to  represent  the  models  (particularly  in  the  naive  physics  stage);  and  structure-mapping  theory, 
which  characterizes  the  kinds  of  computations  that  move  the  learner  from  one  representation  to 
another.  Then  the  overall  role  of  structure-mapping  comparisons  is  examined  in  the  progression 
from  rich  to  sparse  representations.  With  these  foundations  in  place,  the  four  stages  of  learning 
for  physical  domains  we  postulate  are  then  described. 


2.  Qualitative  Process  Theory 

The  first  requirement  is  a  language  in  which  to  describe  people’s  common  sense  knowledge 
about  physical  situations.  People  know  about  a  great  many  kinds  of  physical  changes:  things 
move, collide, bend, break, heat up, cool down, flow, and boil. Intuitively we think of these as
processes.  Qualitative  Process  theory  attempts  to  formalize  this  notion  of  process  to  provide  a 
common  form  for  qualitative  theories  of  dynamics.  As  will  be  clear  later  on,  we  do  not  believe 
that  the  first  models  people  construct  of  a  domain  take  the  form  of  processes,  nor  even  that 
people  become  knowledgeable  enough  to  construct  these  models  for  every  domain  they 
experience.  Nevertheless,  some  of  the  concepts  of  QP  theory  will  be  useful  for  describing  models 
in  other  stages  as  well. 

In  QP  theory,  a  physical  situation  is  modelled  as  a  collection  of  objects  and  relationships 
among  them,  with  processes  responsible  for  causing  changes.  The  continuous  parameters  of  an 
object,  such  as  temperature  and  pressure,  are  represented  by  quantities.  A  quantity  has  two 
parts, an amount and a derivative. Amounts and derivatives are both numbers. The model to
keep in mind for numbers is that of the reals, but it is important to note that in QP theory
particular numerical values are never used. Instead, the value of a number is described in terms
of its quantity space: a collection of inequalities that hold between it and other quantities. Figure
1 illustrates a quantity space for the level of liquid in a container. The quantity space is a useful
qualitative  representation  because  processes  typically  start  and  stop  when  inequalities  between 
parameters  change. 
Figure 1 - A quantity space

A quantity space describes the value of a number by the inequality relationships that hold
between it and other numbers. An arrow indicates that the number at its head is greater than
the number at its tail. Thus LEVEL(wa) is less than LEVEL(wb) and greater than BOTTOM(a),
while LEVEL(wb) and TOP(a) are unordered.


BOTTOM(a) --> LEVEL(wa) --> TOP(a)
                  |
                  +--> LEVEL(wb)
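
As an illustration only, the quantity space of Figure 1 can be sketched as a partial order in Python. The class name `QuantitySpace` and its methods are hypothetical and are not part of QP theory's notation; the sketch stores only "less than" facts and answers ordering queries by transitivity, so no numerical values are ever used.

```python
from collections import defaultdict


class QuantitySpace:
    """Partial order over symbolic numbers, stored as 'less-than' edges."""

    def __init__(self):
        # Maps x to the set of values recorded as directly greater than x.
        self._greater = defaultdict(set)

    def add_less_than(self, smaller, larger):
        self._greater[smaller].add(larger)

    def is_less(self, a, b):
        """True if a < b follows by transitivity from the recorded inequalities."""
        stack, seen = [a], set()
        while stack:
            node = stack.pop()
            for nxt in self._greater.get(node, ()):
                if nxt == b:
                    return True
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return False

    def relation(self, a, b):
        if self.is_less(a, b):
            return "<"
        if self.is_less(b, a):
            return ">"
        return "unordered"


# The three arrows of Figure 1.
space = QuantitySpace()
space.add_less_than("BOTTOM(a)", "LEVEL(wa)")
space.add_less_than("LEVEL(wa)", "TOP(a)")
space.add_less_than("LEVEL(wa)", "LEVEL(wb)")

print(space.relation("LEVEL(wa)", "LEVEL(wb)"))  # <
print(space.relation("BOTTOM(a)", "TOP(a)"))     # <
print(space.relation("LEVEL(wb)", "TOP(a)"))     # unordered
```

Returning "unordered" rather than guessing is the point of the representation: the description commits only to the inequalities that are actually known.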


Figure  2  illustrates  a  typical  process,  called  LIQUID-FLOW.  A  process  has  five  parts: 
individuals,  preconditions,  quantity  conditions,  relations,  and  influences.  Roughly  speaking,  the 
individuals  part  describes  where  instances  of  a  process  might  occur,  the  preconditions  and 
quantity  conditions  tell  when  it  will  be  acting,  and  the  relations  and  influences  describe  what 
holds as a consequence of it acting. In more detail, for any collection of objects that matches the
individual  specifications  there  is  a  process  instance  which  represents  the  potential  for  that  process 
to  occur  between  those  individuals  in  a  particular  way.  For  example,  there  will  be  two  instances 
of  LIQUID-FLOW  between  the  liquid  in  the  containers  of  figure  1,  each  corresponding  to  flow  in  a 
particular  direction. 

A process instance is active whenever both its preconditions and its quantity conditions are
true. The distinction between preconditions and quantity conditions is that quantity conditions
can be determined within QP theory but preconditions cannot. Quantity conditions concern
what inequalities hold and what other processes (or individual views, which are introduced below)
are active. Preconditions concern any relevant factors other than quantity conditions, such as
spatial boundaries. For example, in “traditional” physics we can solve equations to figure out
how fast a ball will be moving when it hits the floor, but the equations will not tell us a priori
where the floor is. Or, returning to the present example, if we know that all the valves in the
fluid path between the two containers are open (i.e., the fluid path is aligned) then fluid will flow,
but we cannot predict within QP theory when or if someone will walk by and turn off a valve.
Because these factors still affect dynamical conclusions, preconditions must be explicitly
represented.
Figure 2 - A typical process

This process specification describes a simple kind of liquid flow. It can occur between two
contained liquids that are connected by a fluid path, whenever the path is aligned - that is, all valves
in the path are open - and the pressure in the contained liquid taken as source is greater than the
pressure in the contained liquid taken as destination. The quantity type AMOUNT-OF represents how much
“stuff” there is in an object. The function A maps a quantity into the number that is its
amount, whereas AMOUNT-OF is a function that maps a piece of stuff into
a quantity.

Process LIQUID-FLOW

Individuals:
  source, a CONTAINED-LIQUID
  dest, a CONTAINED-LIQUID
  path, a FLUID-PATH, FLUID-CONNECTION(source, dest, path)

Preconditions:
  ALIGNED(path)

Quantity Conditions:
  A[PRESSURE(source)] > A[PRESSURE(dest)]

Relations:
  Let flow-rate, diff be quantities
  diff = PRESSURE(source) - PRESSURE(dest)
  flow-rate ∝Q+ diff

Influences:
  I+(AMOUNT-OF(dest), A[flow-rate])
  I-(AMOUNT-OF(source), A[flow-rate])
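
A minimal Python sketch may help make the notions of process instance and activity concrete. It is an illustration under stated assumptions, not an implementation of QP theory: the class names, the `valves_open` flag standing in for ALIGNED(path), and the numeric pressures are all hypothetical (QP theory itself would use only inequalities between pressures).

```python
from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class ContainedLiquid:
    name: str
    pressure: float  # numeric stand-in (assumption); only the ordering matters


@dataclass(frozen=True)
class FluidPath:
    name: str
    ends: frozenset    # names of the two contained liquids the path connects
    valves_open: bool  # stands in for ALIGNED(path)


@dataclass(frozen=True)
class LiquidFlowInstance:
    source: ContainedLiquid
    dest: ContainedLiquid
    path: FluidPath

    def preconditions(self):
        return self.path.valves_open

    def quantity_conditions(self):
        return self.source.pressure > self.dest.pressure

    def is_active(self):
        # Active only when preconditions and quantity conditions both hold.
        return self.preconditions() and self.quantity_conditions()


def liquid_flow_instances(liquids, paths):
    """One instance per (source, dest, path) matching the Individuals part."""
    return [
        LiquidFlowInstance(src, dst, path)
        for src, dst in permutations(liquids, 2)
        for path in paths
        if path.ends == frozenset({src.name, dst.name})
    ]


wa = ContainedLiquid("wa", pressure=1.0)
wb = ContainedLiquid("wb", pressure=2.0)
pipe = FluidPath("pipe", frozenset({"wa", "wb"}), valves_open=True)

instances = liquid_flow_instances([wa, wb], [pipe])
active = [(i.source.name, i.dest.name) for i in instances if i.is_active()]
print(len(instances), active)  # 2 [('wb', 'wa')]
```

Two instances are created, one per flow direction, but only the one whose source has the higher pressure is active. Closing the valve (`valves_open=False`) would deactivate both without changing any quantity.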


Whenever a process instance is active, its influences and relations hold. The influences
component of a process specifies its direct effects; the relations component describes other facts
that are true while the process is active. The direct effects - called direct influences - take the form

I+(Q, n) or I-(Q, n)

depending on whether n is a positive or negative contribution to the derivative of Q. If a
quantity is directly influenced, its derivative will be the sum of all the direct influences on it.
Returning to the description of LIQUID-FLOW, for example, we see that when an instance of
LIQUID-FLOW is active there will be a positive influence on the amount of liquid in the
destination and an equal, negative influence on the amount of liquid in the source.
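
Because only signs are available, summing direct influences amounts to combining signs. The following Python sketch shows one way this could be done; the function name and the convention of returning `None` when opposing influences leave the sign undetermined are assumptions made for illustration.

```python
def resolve_direct_influences(signs):
    """Combine the signs (+1, -1, 0) of the direct influences on one quantity.

    Returns the sign of the derivative, or None when opposing influences
    leave the sign undetermined without further information.
    """
    nonzero = {s for s in signs if s != 0}
    if not nonzero:
        return 0
    if len(nonzero) == 1:
        return nonzero.pop()
    return None


# One active LIQUID-FLOW instance: I+ on the destination, I- on the source.
# The flow rate is taken to be positive while the process is active.
influences = {
    "AMOUNT-OF(dest)": [+1],
    "AMOUNT-OF(source)": [-1],
}
derivative_sign = {q: resolve_direct_influences(s) for q, s in influences.items()}
print(derivative_sign)  # {'AMOUNT-OF(dest)': 1, 'AMOUNT-OF(source)': -1}
```

With a second process draining the destination, the list for `AMOUNT-OF(dest)` would become `[+1, -1]` and the sketch would report the sign as undetermined.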

The relations field can describe new individuals that are created by virtue of the process
being active (such as the steam produced by boiling water) as well as properties needed by
representations outside of QP theory (such as the appearance of boiling water). An especially
important kind of fact expressed in the relations component is functional dependency between
quantities. Functional dependencies between quantities are expressed by

Q1 ∝Q+ Q2

(read “Q1 is qualitatively proportional to Q2,” or informally, “Q1 q-prop Q2”), meaning there
exists a function which determines Q1 and is strictly increasing in its dependence on Q2. ∝Q-
indicates that the dependence is strictly decreasing. Note that qualitative proportionalities
express partial information, since the exact nature of the function relating the parameters is not
known and the function may or may not depend upon other quantities.2 If a quantity Q1 is
functionally dependent on a quantity Q2, and Q2 is influenced by a process P, then we will say
that P indirectly influences Q1; that is, when P is acting it can cause Q1 to change. If, for
instance,  the  PRESSURE  and  LEVEL  of  a  liquid  are  qualitatively  proportional  to  the  AMOUNT-OF 
of  the  liquid,  then  LIQUID-FLOW  will  indirectly  influence  both  PRESSURE  and  LEVEL  because  it 
directly  influences  AMOUNT-OF.  It  is  important  to  note  that  the  only  way  a  quantity  can  change 
is  if  it  is  directly  or  indirectly  influenced.  This  means  one  can  reason  by  exclusion:  If  nothing  is 
influencing  the  amount  of  fluid  in  a  container,  then  it  isn’t  changing,  but  if  the  amount  is 
changing,  something  must  be  influencing  it.  No  changes  happen  by  themselves.  Furthermore,  we 
can  trace  the  possible  paths  of  influences  in  a  situation  and  determine  whether  or  not  particular 
kinds  of  changes  can  occur. 
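
The tracing of influence paths described above can be sketched in Python as sign propagation along q-prop links. This is an illustrative simplification: the triple encoding, the function names, and the restriction to an acyclic graph in which each dependent quantity has at most one changing parent are all assumptions, not part of QP theory.

```python
# Each entry (dependent, independent, sign) encodes "dependent q-prop independent";
# sign is +1 for strictly increasing and -1 for strictly decreasing dependence.
QPROPS = [
    ("LEVEL", "AMOUNT-OF", +1),
    ("PRESSURE", "AMOUNT-OF", +1),
]


def propagate(direct_signs, qprops):
    """Spread derivative signs from directly influenced quantities along q-prop links.

    Assumes an acyclic q-prop graph in which each dependent quantity has at
    most one changing parent, so no ambiguity has to be resolved.
    """
    signs = dict(direct_signs)
    changed = True
    while changed:
        changed = False
        for dependent, independent, sign in qprops:
            if independent in signs and dependent not in signs:
                signs[dependent] = sign * signs[independent]
                changed = True
    return signs


def can_change(quantity, direct_signs, qprops):
    """Reasoning by exclusion: a quantity with no influence on it is not changing."""
    return propagate(direct_signs, qprops).get(quantity, 0) != 0


# LIQUID-FLOW directly decreases the source's AMOUNT-OF.
print(propagate({"AMOUNT-OF": -1}, QPROPS))
# {'AMOUNT-OF': -1, 'LEVEL': -1, 'PRESSURE': -1}
print(can_change("PRESSURE", {}, QPROPS))  # False
```

The second call illustrates the exclusion argument: with no direct influences anywhere, nothing reachable through q-prop links can be changing.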

Two  other  important  types  of  descriptions  should  also  be  mentioned  here.  Individual  views 
are descriptions used to represent both objects whose existence is subject to dynamical
constraints and states of objects. “The water in a cup,” for example, is described as a
CONTAINED-LIQUID (see Figure 3) because we can get rid of it by reducing its amount to zero
(perhaps by making it the source of an instance of LIQUID-FLOW). Another example is a model
of a spring. Springs have three states - relaxed, compressed, or stretched - each of which can be
modeled  by  individual  views.  Individual  views  are  specified  in  the  same  way  that  processes  are, 
in  that  they  have  individuals,  preconditions,  quantity  conditions,  and  relations.  However,  they 
do  not  have  direct  influences;  directly  influencing  quantities  is  the  sole  prerogative  of  processes. 

The other kind of description is the encapsulated history. How an object changes through
time is represented by its history (Hayes 1979b). Histories are annotated pieces of space-time;
thus they are object centered, have finite spatial extent, and extend over time.3 As its name
suggests, an encapsulated history is a schematized description of some fragments of histories for
a collection of objects. Encapsulated histories are useful as summaries of behavior and for
directly describing phenomena that have not been accounted for by process descriptions. An
example of the latter usage is describing collisions between moving objects. A very simple way to
model collisions is to say that the very next thing that happens after, say, an object hits a wall is
that its velocity reverses and it starts moving the other way. Given how rapidly collisions occur,
this model is quite adequate for most purposes, and encapsulated histories allow us to write it
this way.


2 QP theory also provides ways to specify dependence on properties that are not quantities (such as shape, in
relating the level of a liquid in a container to its volume) and to make stronger statements about functional
relationships, such as “Q1 depends on Q2 directly, with no intervening parameters” and “Q depends on Q1 and Q2 and
nothing else,” when required for framing stronger hypotheses about a domain. However, precise specifications of
functions are not permitted.

3 By contrast, the classic situational calculus (McCarthy & Hayes, 1969) description of change consists of
situations that describe the whole universe at some particular instant of time.


Framework

Forbus & Gentner


Figure 3 - A typical individual view

This typical individual view describes a piece of liquid in a container, using the ontology for
liquids described in (Hayes 1979a). THERE IS is just “syntactic sugar” for stating that whenever
the preconditions and quantity conditions are true, g will exist.

INDIVIDUAL-VIEW CONTAINED-LIQUID

Individuals:
  c, a CONTAINER
  s, a SUBSTANCE

Preconditions:
  CAN-CONTAIN-SUBSTANCE(c, s)

Quantity Conditions:
  A[AMOUNT-OF-IN(s, c)] > ZERO

Relations:
  THERE IS g, a PIECE-OF-STUFF
  HAS-QUANTITY(g, AMOUNT-OF)
  AMOUNT-OF(g) = AMOUNT-OF-IN(s, c)
  HAS-QUANTITY(g, LEVEL)
  LEVEL(g) ∝Q+ AMOUNT-OF(g)
  HAS-QUANTITY(g, PRESSURE)
  PRESSURE(g) ∝Q+ LEVEL(g)


A reasoner’s theory of dynamics for a particular domain is characterized in terms of (1) a
process vocabulary that describes the kinds of processes the reasoner believes can occur and (2) a
view vocabulary that describes dynamical objects and relevant states of objects. All changes are
assumed to be directly or indirectly caused by processes - the sole mechanism assumption - which
provides a strong constraint on the form of dynamical theories. Importantly, the content of
dynamical theories is not tightly constrained - incorrect theories can be expressed as easily as (and
sometimes more easily than!) correct theories. For example, versions of Newtonian, Aristotelian,
and Impetus theories of motion have all been encoded using QP theory.

QP theory sanctions several basic deductions. For example, the kinds of processes that
might occur in a situation can be determined by using the process and view vocabularies to
construct instances representing the different possibilities. The collection of processes acting at
any time characterizes “what is happening” in that situation at that time, and these processes can be
found by evaluating the preconditions and quantity conditions for these instances.

Consider again the example in Figure 1. There will be two instances of the LIQUID-FLOW
process, and since the level in wb is greater than that in wa, the LIQUID-FLOW instance representing flow
from wb to wa will be active. By taking into account all of the influences on each quantity (called
resolving its influences), we can often determine the sign of its derivative. The sign of the
derivative is important because it represents how the amount of the quantity is changing -
increasing, decreasing, or constant. In this example there is only one process instance acting,
which makes things simple. AMOUNT-OF(wb) is directly influenced, and since this influence is
negative it will decrease. By the ∝Q+ statements in the CONTAINED-LIQUID description,
LEVEL(wb) and PRESSURE(wb) will be indirectly influenced and thus will also decrease.
Similarly, AMOUNT-OF(wa), LEVEL(wa), and PRESSURE(wa) will increase.

From  the  ways  the  quantities  are  changing  we  can  determine  how  the  process  and  view 
structures  themselves  might  change,  since  they  depend  in  part  on  the  inequalities  stated  as 
quantity conditions. This computation is called limit analysis. In the example, two things might
happen: the pressures in wb and wa might equalize, or AMOUNT-OF(wb) could become zero, thus
ending wb’s existence (the geometry of this example rules out the latter).
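
A rough Python sketch of the limit-analysis step for this example follows. It is a simplification offered as an assumption-laden illustration: the function name is hypothetical, only inequalities whose gap is known to be shrinking from the derivative signs are reported, and cases where both sides move in the same direction (which would require comparing rates) are ignored.

```python
def limit_hypotheses(inequalities, derivative_sign):
    """List inequalities 'greater > lesser' that the current changes could end.

    derivative_sign maps a quantity to -1, 0, or +1. Constants such as ZERO
    are absent from the map and are treated as unchanging.
    """
    hypotheses = []
    for greater, lesser in inequalities:
        closing = derivative_sign.get(lesser, 0) - derivative_sign.get(greater, 0)
        if closing > 0:  # the gap between the two values is shrinking
            hypotheses.append(f"{greater} = {lesser}")
    return hypotheses


# Derivative signs obtained by resolving influences while wb flows into wa.
signs = {
    "PRESSURE(wb)": -1, "PRESSURE(wa)": +1,
    "AMOUNT-OF(wb)": -1, "AMOUNT-OF(wa)": +1,
}
conditions = [
    ("PRESSURE(wb)", "PRESSURE(wa)"),  # quantity condition of the active LIQUID-FLOW
    ("AMOUNT-OF(wb)", "ZERO"),         # quantity condition of CONTAINED-LIQUID for wb
    ("AMOUNT-OF(wa)", "ZERO"),         # quantity condition of CONTAINED-LIQUID for wa
]
print(limit_hypotheses(conditions, signs))
# ['PRESSURE(wb) = PRESSURE(wa)', 'AMOUNT-OF(wb) = ZERO']
```

The sketch produces both candidate changes named in the text. Ruling out the second one requires knowledge, such as the geometry of the containers, that lies outside this computation.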

The  basic  deductions  of  QP  theory  can  be  combined  to  perform  more  complex  reasoning 
tasks. Two examples of more complex deductions are qualitative simulation (Forbus, 1984) and
measurement interpretation (Forbus, 1983; 1986). Qualitative simulation consists of performing
limit  analysis  repeatedly.  It  is  useful  for  making  predictions:  for  instance,  that  boiling  water  in 
a  sealed  container  could  cause  an  explosion.  Measurement  interpretation  provides  a  link  between 
physical  theories  and  observations;  for  example,  it  might  be  hypothesized  that  the  level  of  fluid  in 
a  container  is  dropping  because  the  fluid  is  flowing  out  somewhere.  Measurements  may  be 
interpreted  by  searching  through  the  space  of  process  and  view  structures,  looking  for  situations 
where  the  results  of  influence  resolution  match  the  observations  and  which  can  be  woven  together 
to  form  a  temporally  consistent  pattern  of  behavior. 

3.  Comparisons  and  structure-mapping 

So  far  we  have  considered  how  portions  of  a  person’s  knowledge  about  the  physical  world 
might  be  represented.  Let  us  now  turn  to  the  question  of  how  such  domain  models  might  be 
learned.  We  conjecture  that  a  major  process  in  experiential  learning  is  comparing  the  current 
situation with stored descriptions. Consider for example a person who has just moved to a cold
climate and is learning to operate a furnace. Suppose that at first he wrongly believes that the
house will get warm faster if the thermostat is set to a temperature higher than the desired
temperature. (Kempton (1985) shows that this view is quite common.) How can he reach the
correct conclusion that the rate of heating does not depend on the temperature setting? There
are at least three different ways, each based on a different kind of implicit comparison. First, he
could compare his past furnace experiences with each other and notice a regularity in the rate of
heating that is independent of the thermostat setting. Second, he may compare the furnace
situation with known abstractions, and realize that it is best described as a position-action
controller (as opposed to a proportional-action controller). Third, he may use an analogy,
comparing  the  furnace  situation  with  a  description  from  another  domain,  such  as  fluid  flow,  to 
suggest  governing  principles.  Each  of  these  ways  of  learning  relies  on  some  form  of  comparison, 
either  with  a  stored  record  of  literally  similar  events,  with  a  stored  abstraction,  or  with  a  stored 
description  that  can  function  as  an  analogy. 

Structure-mapping theory is concerned with such comparisons (see Gentner 1980, 1982,
1983; Gentner & Gentner, 1983). The theory describes the rules that are used to import a
descriptive structure from one domain (the base domain) into another (the target domain). The
central intuition is that an analogy suggests that a predicate structure from one domain can be
applied in another domain with arbitrarily different objects and surface appearances. Literal
similarity, analogy, mere-appearance mappings, and abstraction mappings (applications of general
laws) are viewed as different kinds of mappings between descriptions. The types of comparisons
are  defined  syntactically,  in  terms  of  the  form  of  the  knowledge  representation,  not  in  terms  of 
its  content.  Each  type  of  comparison  will  be  considered  in  turn. 

1.  An  analogy  is  a  comparison  in  which  relational  predicates,  but  few  or  no  object 
attributes,  are  mapped  from  base  to  target.  The  particular  relations  mapped  are  determined  by 
systematicity,  as  defined  by  the  existence  of  higher-order  constraining  relations  which  can 
themselves  be  mapped.4  Thus,  a  relational  chain  -  such  as  a  causal  chain  -  in  the  base  that 
matches  a  relational  chain  in  the  target  constitutes  good  support  for  its  members.  Winston 
(1983)  gives  an  insightful  demonstration  of  the  need  for  such  importance-dominated  matching. 
The  correspondences  between  objects  of  the  base  and  objects  of  the  target  are  determined  by  the 
roles  of  the  objects  in  the  relational  structure,  not  by  any  intrinsic  similarity  between  the  objects 
themselves. 

2. A literal similarity statement is a comparison in which a large number of predicates,
both attributes and relations, can be mapped from base to target. Here, the model is based on
one proposed by Tversky (1977), which states that the similarity between A and B increases with
the size of the intersection of their feature sets and decreases with the size of
the two complement sets.5 In a literal similarity match there are many more shared predicates
than nonshared predicates.

3.  An  abstraction  mapping  is  a  comparison  in  which  the  base  domain  is  an  abstract 
relational  structure.  Predicates  from  the  abstract  base  domain  are  mapped  into  the  target 
domain.  As  in  analogy,  the  mapped  predicates  are  a  relational  structure.  Abstraction  differs 
from  analogy  in  the  nature  of  the  base  domain.  There  are  almost  no  object  attributes  in  the  base, 
so  there  are  few,  if  any,  one-place  predicates  to  be  left  behind.  Applying  a  rule  to  a  situation  is 
an  example  of  abstraction  mapping.  Sometimes  the  relational  structure  so  mapped  will  also  be 
referred  to  as  an  abstraction. 

4. A mere-appearance match is a comparison in which the object attributes match, but the
relational  structure  does  not.  In  a  sense  it  is  the  opposite  of  analogy.  Such  matches  are  easily 
made;  but  they  guarantee  nothing  beyond  similarity  in  appearance. 
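
The Tversky-style measure mentioned under literal similarity (item 2 above) can be written out as a short Python sketch. Using set size as the salience measure, and the particular weights and feature sets below, are assumptions chosen purely for illustration; they do not come from the paper.

```python
def contrast_similarity(a, b, theta=1.0, alpha=0.5, beta=0.5):
    """Contrast-model score: shared features count for, distinctive ones against.

    theta weights the shared features, alpha the features of a not in b,
    and beta the features of b not in a. Set size stands in for salience.
    """
    a, b = set(a), set(b)
    return theta * len(a & b) - alpha * len(a - b) - beta * len(b - a)


# Hypothetical feature sets, invented for this example.
water = {"LIQUID", "CLEAR", "FLOWS", "PRESSURE-DRIVEN"}
milk = {"LIQUID", "WHITE", "FLOWS", "PRESSURE-DRIVEN"}
heat = {"FLOWS", "TEMPERATURE-DRIVEN"}

print(contrast_similarity(water, milk))  # 2.0
print(contrast_similarity(water, heat))  # -1.0
```

Setting `alpha` and `beta` to different values makes the score asymmetric, so that the similarity of A to B need not equal the similarity of B to A.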

A series of related examples, starting with the analogy between heat flow and water flow,
will illustrate these distinctions. Figures 4a and 4b show a water-flow situation and the
corresponding heat-flow situation (adapted from Buckley, 1979, pp. 15-25). Figure 5 shows a
possible representation a person might have of the water situation. Notice that the description
contains both object attribute predicates, such as CYLINDRICAL(beaker), and relational
predicates, such as

4 Object attributes are predicates which take one object as an argument, such as RED(x). We define the order
of a proposition as follows: Constants and objects have order zero. The order of a proposition is one plus the
maximum of the orders of its arguments. Thus a proposition whose arguments x and y are domain objects would be
first-order, and a proposition taking such a proposition as an argument would be second-order. Examples of
higher-order relations include IMPLIES.

5 According to Tversky, the negative effects of the two complement sets are not equal: for example, given
the question “How similar is A to B?”, the set (B - A) - features of B not shared by A - counts more than the set
(A - B).
Figure 4 - Two Physical Situations Involving Flow

We will use these physical situations to illustrate the kinds of comparisons sanctioned by
structure-mapping theory, and later to illustrate how QP-style domain descriptions can be used
in analogies. Part (a) is a water-flow situation; part (b) is the corresponding heat-flow situation.


GREATER-THAN[PRESSURE(water, beaker), PRESSURE(water, vial)].

Let us consider the comparison types as exemplified here:
1. The analogy “Heat is like water” conveys that certain aspects of the water description can
be mapped onto the heat domain. In particular, (1) object attributes should be dropped; (2) some
relational  predicates  should  be  carried  over;  and  (3)  systematicity  determines  which  relations 
should  be  mapped.  Thus, 

CYLINDRICAL (beaker) 

is  dropped,  along  with  other  object  attributes;  that  is,  the  target  objects  do  not  have  to  resemble 
their  corresponding  base  objects.  Some  relations  are  carried  across,  such  as, 

GREATER-THAN [PRESSURE (water,  beaker),  PRESSURE (water,  vial)]. 
Yet not all relations are carried across. By the systematicity principle, this
GREATER-THAN[PRESSURE(water, beaker), PRESSURE(water, vial)] relation is preserved because it
is part of the mappable chain governed by the higher-order relation IMPLIES. In contrast, the
relation

GREATER-THAN[CROSS-SECTIONAL-AREA(beaker), CROSS-SECTIONAL-AREA(vial)]

is not carried across, since it is not part of any mappable system of constraining relations in this
representation of the base domain.


Figure 6 shows the representation in the target domain of heat-flow that results from the
analogical mapping. Given the object correspondences heat/water, beaker/coffee, vial/ice cube,
pipe/bar, and PRESSURE/TEMPERATURE,8 systematicity operates to enforce a tacit preference for
coherence and predictive power. The systematic relational structure in the water domain

IMPLIES [GREATER-THAN [PRESSURE (water, beaker),
                       PRESSURE (water, vial)],
         FLOW (water, pipe, beaker, vial)]

is mapped into

IMPLIES [GREATER-THAN [TEMPERATURE (heat, coffee),
                       TEMPERATURE (heat, ice cube)],
         FLOW (heat, bar, coffee, ice cube)].
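
The selection of relations described above can be sketched in a few lines of Python. This is only a toy illustration, not the authors' implementation: the tuple encoding of expressions, the names `HIGHER_ORDER`, `subexpressions`, `substitute`, and `map_to_target`, and the rule "carry over a fact only if it appears inside some higher-order relation" are all assumptions made for the example.

```python
# Toy sketch of structure-mapping selection (illustrative assumptions throughout).
# Expressions are nested tuples: (PREDICATE, arg1, arg2, ...); objects are strings.

HIGHER_ORDER = {"IMPLIES", "CAUSE"}  # assumed set of higher-order relations


def subexpressions(expr):
    """Yield expr and every nested tuple expression inside it."""
    if isinstance(expr, tuple):
        yield expr
        for part in expr[1:]:
            yield from subexpressions(part)


def substitute(expr, correspondences):
    """Replace every symbol that has a correspondence; leave other symbols alone."""
    if isinstance(expr, tuple):
        return tuple(substitute(part, correspondences) for part in expr)
    return correspondences.get(expr, expr)


def map_to_target(base_facts, correspondences):
    """Carry over only facts that take part in a system governed by a higher-order relation."""
    governed = set()
    for fact in base_facts:
        if fact[0] in HIGHER_ORDER:
            governed.update(subexpressions(fact))
    return [substitute(fact, correspondences) for fact in base_facts if fact in governed]


pressure_gt = ("GREATER-THAN", ("PRESSURE", "water", "beaker"), ("PRESSURE", "water", "vial"))
area_gt = ("GREATER-THAN", ("CROSS-SECTIONAL-AREA", "beaker"), ("CROSS-SECTIONAL-AREA", "vial"))
water_facts = [
    ("CYLINDRICAL", "beaker"),  # object attribute: dropped
    pressure_gt,                # part of the IMPLIES chain: carried over
    area_gt,                    # isolated relation: not carried over
    ("IMPLIES", pressure_gt, ("FLOW", "water", "pipe", "beaker", "vial")),
]
correspondences = {
    "water": "heat", "beaker": "coffee", "vial": "ice cube",
    "pipe": "bar", "PRESSURE": "TEMPERATURE",
}

heat_facts = map_to_target(water_facts, correspondences)
for fact in heat_facts:
    print(fact)
```

In this sketch the attribute and the cross-sectional-area comparison are dropped simply because nothing higher-order mentions them in this representation of the base; a fuller model would also have to search for the object correspondences rather than take them as given.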

Note that the systematicity principle requires a mappable relational chain. If a particular chain
of higher-order relations in the base is not valid in the target, then another chain is
selected. For example, suppose that we keep the same base domain - the system of containers
shown in Figure 5 - but change the target domain. Suppose the two target objects are identical
in temperature, but differ in their specific heats: say, a metal ball-bearing and a marble of
equal mass. Now, the natural analogy concerns capacity differences in the base, rather than
pressure differences. This is because the deepest relational chain that can be mapped to the
target now concerns the situation in which pressures are equal in the base domain (analogously to
temperatures being equal in the target domain):

IMPLIES [GREATER-THAN [CROSS-SECTIONAL-AREA (beaker),
                       CROSS-SECTIONAL-AREA (vial)],
         GREATER-THAN [AMOUNT-OF-WATER (beaker),
                       AMOUNT-OF-WATER (vial)]]

This  carries  over  into  the  target  as 

IMPLIES [GREATER-THAN [HEAT-CAPACITY (marble),
                       HEAT-CAPACITY (ball-bearing)],
         GREATER-THAN [AMOUNT-OF-HEAT (marble),
                       AMOUNT-OF-HEAT (ball-bearing)]].

That  is,  given  the  same  height  (pressure)  the  container  with  a  larger  area  will  hold  more  water. 
Analogously,  at  the  same  temperature  the  object  with  greater  heat  capacity  will  hold  more  heat. 
Thus  the  interpretation  of  an  analogy  depends  on  the  best  relational  match  between  base  and 
target. 
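
The capacity analogy can also be written as a short worked relation. The formulas below are an idealization supplied for illustration and are not part of the original representation: they assume containers of uniform cross-section $A$ filled to height $h$, and objects of constant heat capacity $C$ whose heat content $Q$ is measured from a reference temperature of zero, with $h > 0$ and $T > 0$.

```latex
\begin{align*}
V &= A\,h, & Q &= C\,T \\[4pt]
h_{\mathrm{beaker}} = h_{\mathrm{vial}},\quad A_{\mathrm{beaker}} > A_{\mathrm{vial}}
  &\;\Longrightarrow\; V_{\mathrm{beaker}} > V_{\mathrm{vial}} \\
T_{\mathrm{marble}} = T_{\mathrm{bearing}},\quad C_{\mathrm{marble}} > C_{\mathrm{bearing}}
  &\;\Longrightarrow\; Q_{\mathrm{marble}} > Q_{\mathrm{bearing}}
\end{align*}
```

Under these assumptions the two implications have the same form, which is what lets the CROSS-SECTIONAL-AREA chain in the base map onto the HEAT-CAPACITY chain in the target.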


8. In this analogy, the first-order predicate of PRESSURE in the water domain must be replaced by
TEMPERATURE in the heat domain. Although systems of relations can often be imported into the
target without change, substitutions of functions, as well as objects and their attributes, are
sometimes made in order to permit mapping a larger systematic chain.

Figure 5 - A representation of the water situation

This network represents a portion of what a person might know about the water situation
illustrated in Figure 4. In this and other figures, predicates are written in upper case and circled.
Objects are written in lower case and uncircled. A simplified representation is used to illustrate
the rules of analogy. A more detailed model will be shown later.

Figure 6 - A representation of the heat situation that results from the heat/water analogy

This network represents the knowledge a person would map across into the heat domain from the
water situation illustrated in the previous figure. As in that figure, a simplified representation is
used here. A more detailed treatment of this analogy is presented later.


2.  The  literal  similarity  comparison  Kool-Aid  is  like  water  conveys  that  most  of  the  water 
description  can  be  applied  to  Kool-Aid.  In  literal  similarity,  both  object  attributes,  such  as 
WET (water), and relational predicates, such as the systematic chain discussed above, are
mapped  over. 

3. The abstraction Heat is a through-variable might be available to a student who knows
some system dynamics. This abstraction conveys the idea that heat can be thought of as
something that flows across a difference in potential (i.e., across some sort of “across-variable” -
in this case, temperature). This is much the same relational structure as conveyed by the
analogy above; the difference is that in the abstract base domain of through-variables and
across-variables there are no concrete properties of objects to be left behind in the mapping.

4.  A  mere-appearance  match  is  a  match  with  overlap  chiefly  in  lower-order  predicates, 
such  as  object-attributes,  but  little  or  no  relational  match.  An  example  is  The  tabletop  sparkled 
like  water.  Such  a  match  typically  yields  little  or  no  useful  information  about  the  target;  here, 
for  example,  little  can  be  learned  about  the  table  by  mapping  across  knowledge  about  water. 
These  matches,  however,  cannot  be  ignored  in  a  theory  of  learning  because  a  novice  learner  may 
be  unable  to  tell  them  from  true  literal  similarity  matches. 

Table 1 summarizes the kinds of predicate overlap that characterize literal similarity,
analogy,  abstraction,  and  mere  appearance  matches,  as  well  as  one  other  kind  of  comparison, 
anomaly.  An  anomaly  is  a  match  with  little  or  no  predicate  overlap;  it  is  included  simply  for 
completeness. 

It should be clear that the contrasts described here are continua, not dichotomies. For
example, analogy and literal similarity lie on a continuum. Given that two domains overlap in
relational structure, then the comparison becomes more a literal similarity match to the extent
that their object attributes also overlap, and more an analogy to the extent that few or no object
attributes overlap. A different sort of continuum exists between analogies and general laws. In
both  cases,  a  relational  structure  is  mapped  from  base  to  target.  If  the  base  representation 
includes  concrete  objects  whose  individual  attributes  must  be  left  behind  in  the  mapping,  the 
comparison  is  an  analogy.  As  the  object  nodes  of  the  base  domain  become  more  abstract  and 
variable-like,  the  comparison  is  seen  as  a  general  law. 
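
As a rough illustration of how the comparison types relate to predicate overlap, the following Python sketch classifies a comparison from the fraction of base attributes and base relations shared with the target. It is a simplification made for this example, not a model proposed in the text: the function names, the 0.5 threshold, the treatment of relations as bare predicate names, and all of the attribute sets are hypothetical. A hard threshold is also a crude stand-in for what the text describes as continua.

```python
def overlap(base, target):
    """Fraction of the base's predicates that also hold in the target (0.0 for an empty base)."""
    return len(base & target) / len(base) if base else 0.0


def classify_comparison(base_attrs, target_attrs, base_rels, target_rels, threshold=0.5):
    """Label a comparison by whether many attributes and many relations are shared."""
    many_attrs = overlap(base_attrs, target_attrs) >= threshold
    many_rels = overlap(base_rels, target_rels) >= threshold
    if many_rels and many_attrs:
        return "literal similarity"
    if many_rels:
        # An abstract base has no concrete object attributes to leave behind.
        return "analogy" if base_attrs else "abstraction"
    return "mere appearance" if many_attrs else "anomaly"


FLOW_RELS = {"IMPLIES", "GREATER-THAN", "FLOW"}     # hypothetical relation names
WATER_ATTRS = {"WET", "CLEAR", "SHINY", "LIQUID"}   # hypothetical attribute names

# Each entry: (base attributes, target attributes, base relations, target relations)
examples = {
    "Kool-Aid is like water": (WATER_ATTRS, {"WET", "LIQUID", "SWEET"}, FLOW_RELS, FLOW_RELS),
    "Heat is like water": (WATER_ATTRS, {"INTANGIBLE"}, FLOW_RELS, FLOW_RELS),
    "Heat is a through-variable": (set(), {"INTANGIBLE"}, FLOW_RELS, FLOW_RELS),
    "The tabletop sparkled like water": (
        WATER_ATTRS, {"CLEAR", "SHINY", "FLAT"}, FLOW_RELS, {"SUPPORTS"}),
    "Coffee is like the solar system": (
        {"MASSIVE", "ROUND"}, {"BROWN", "LIQUID"},
        {"REVOLVES-AROUND", "ATTRACTS"}, {"CONTAINED-IN"}),
}

results = {name: classify_comparison(*args) for name, args in examples.items()}
for name, label in results.items():
    print(f"{name}: {label}")
```

Lowering or raising `threshold` moves borderline cases between categories, which is one way to see why analogy and literal similarity are better thought of as ends of a continuum than as separate classes.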


4.  Structure-Mapping  and  Learning 

The role of a comparison in learning depends on at least two things: (1) accessibility - the
likelihood that the match will be noticed, and (2) usefulness - what can be deduced from the
match if it is accessed. Accessibility, in turn, depends at least on (a) the familiarity of the base
description and (b) the overall similarity between the base description and the current target.
The immediate usefulness of a match depends, of course, on whether the content of the match is
appropriate to the task at hand. In addition, the usefulness of a match depends on the
inspectability of the matching content - the degree to which it can be consciously analyzed and
articulated. The comparisons discussed above behave very differently with respect to
accessibility and inspectability.

Table 1 - Types of Comparisons

                     ATTRIBUTES   RELATIONS   EXAMPLE

Literal Similarity   Many         Many        Milk is like water
Analogy              Few          Many        Heat is like water
Abstraction          Few          Many        Heat is a through-variable
Anomaly              Few          Few         Coffee is like the solar system
Mere Appearance      Many         Few         The glass tabletop gleamed like a pool of water

For novice learners, literal similarity matches are the most accessible comparisons, and
abstractions, because they are typically extremely unfamiliar, are the least accessible. In
contrast,  abstraction  matches  are  far  more  inspectable  than  literal  similarity  matches.  On  both 
dimensions,  analogies  are  intermediate.  This  is  one  reason  that  analogy  is  crucial  in  learning:  it 
is  the  novice’s  best  route  to  an  abstract,  inspectable  data  structure.  Some  evidence  for  these 
conjectures  will  now  be  reviewed. 

Surface  matches  are  highly  accessible.  This  includes  both  literal  similarity  matches  and 
mere-appearance  matches.  It  has  been  shown  in  education  and  training  literature  that  the  more 
similar  the  new  situation  is  to  the  original  situation  the  more  readily  transfer  of  training  occurs 
(cf. Brown & Campione, 1985; Ross, 1984). The term “generalization gradient” expresses the
fact that a learned response generalizes more readily the more similar the new situation is to the
original situation. In contrast, subjects are often quite slow to use an available analogy. In
research done by Reed, Ernst & Banerji (1974) and later by Gick and Holyoak (1980, 1983),
subjects  were  asked  to  solve  a  rather  difficult  problem,  such  as  how  to  cure  an  inoperable  tumor 
with  radiation  without  killing  the  flesh  along  the  path  of  the  rays.  Just  prior  to  receiving  the 
problem  some  of  the  subjects  read  material  that  contained  an  analogous  solution,  such  as  a  story 
about  a  general  who  split  his  troops  up  so  that  they  all  converged  simultaneously  on  a  fortress  he 
wished  to  capture.  There  are  three  interesting  results  here.  First,  a  good  analogy  can  be  very 
powerful,  if  it  is  noticed.  Without  the  analogy,  only  about  10%  of  the  subjects  could  solve  the 
problem.  Once  the  experimental  subjects  were  told  to  use  the  prior  story  as  an  analogy,  80  to  90 
percent  of  them  solved  the  problem  correctly.  Second,  a  potentially  powerful  analogy  can  easily 
go  unnoticed.  Before  the  analogy  was  pointed  out,  only  about  a  third  of  the  subjects 
spontaneously  noticed  and  used  it.  It  cannot  be  taken  for  granted  that  a  potential  analog  will  be 
spontaneously  noticed  and  used.  Third,  literal  similarity  is  far  more  accessible  than  true 
analogy.  In  one  of  their  studies,  Gick  and  Holyoak  (1983)  happened  to  set  up  a  literal  similarity 
match  between  the  story  and  problem.  Subjects  had  to  solve  a  problem  involving  tying  two 
ropes  together,  and  the  story  they  were  given  involved  tying  two  ribbons  together.  In  this  case, 
70  to  80  percent  of  the  subjects  were  able  to  access  the  matching  story  spontaneously.  In  a 
systematic  series  of  studies,  Ross  (1984)  varied  the  surface  similarity  between  problems  subjects 
were  taught  and  later  problems  they  had  to  solve  and  found  that  subjects  were  much  more  likely 
to  be  reminded  of  problems  with  similar  surface  content. 

There  is  developmental  evidence  that  literal  similarity  and  mere-appearance  matches 
appear  prior  to  analogies  and  abstraction  matches  in  learning.  Kemler  (1983)  has  found  that 
young  children  group  objects  on  the  basis  of  overall  similarity  in  situations  where  adults  would 
group  more  analytically,  using  a  single  dimension.  Keil  and  Batterman  (1984)  compared 
children’s meanings and adults’ and found a “characteristic-to-defining shift.” For example, in
defining “island” preschoolers use such surface features as “having palm trees and beaches” or “a
warm place.” Adults rely on defining features such as “surrounded by water.” Another example
occurs in early word learning. In spontaneous labeling, one-year-old children frequently
apply words to objects that closely resemble the original referent of the word: for example, doggie
will be applied to another dog or to a cat, and car to cars, trucks or other vehicles (Clark, 1973).
Truly analogous usages are seldom heard until the age of two or three years, when, for example, a
three-year-old child might remark about his dirty, bedraggled blanket, “It’s out of gas” (Gentner
& Stuart, 1981; Winner, 1979).

Children are said to move from rich, concrete representations to more abstract, rule-based
systems (Bruner, Olver & Greenfield, 1966; Gibson, 1969). Even three-year-olds can sort objects
into perceptually similar categories; for example, they can group a cat and a dog and exclude a
hen. However, not until they are five or six years old can they succeed with a more abstract
category, such as “living thing,” which requires grouping perceptually dissimilar things. In the
same vein, research on the novice-expert shift in adult learning has demonstrated that whereas
novice science students typically match situations on the basis of surface features, experts use
deeper and more abstract criteria (Larkin, 1983). For example, Chi, Feltovich, and Glaser (1981)
have shown that when novice physics students are asked to classify problems into similar groups
they put together problems with similar surface features, such as “inclined planes” or “pulleys.”
Experts, on the other hand, use categories like “force problems” and “energy problems.”

One  final  indication  of  the  ease  with  which  literal  similarity  matches  are  made  involves  an 
indirect,  but  very  important,  line  of  argument.  In  the  realm  of  object  concepts,  there  is  some 
evidence  that  people  automatically  perform  literal  similarity  comparisons  among  perceptually 
similar experiences. Such comparisons are thought to result in composite prototypes (see Posner
& Mitchell, 1967; Rosch, 1973, 1975, 1978; Smith & Medin, 1981). In the Posner & Mitchell
study, people classified dot patterns into categories. After they had sorted the patterns, they
were  asked  to  remember  which  patterns  they  had  seen.  Although  the  task  simply  called  for 
accessing  verbatim  memory,  subjects  showed  systematic  misrecognitions:  they  falsely 
remembered  having  seen  prototypical  patterns  that  were  never  presented.  Thus,  without  being 
told  to  do  so,  people  formed  composite  mental  representations,  apparently  based  on  implicit 
comparisons  among  the  patterns  that  they  saw.  Even  theories  which  rely  exclusively  on  stored 
exemplar  information  (such  as  that  of  Medin  and  Schaffer,  1978)  share  the  assumption  that 
literal  similarity  matches  are  made  easily  -  indeed,  automatically.  The  difference  is  that  they 
assume  that  these  implicit  comparisons  are  made  at  the  time  of  use  of  the  stored  exemplars, 
rather  than  assuming  that  the  exemplars  are  encoded  into  a  composite  prototype.  The  virtually 
automatic  nature  of  basic  category  learning  is  further  evidence  that  the  literal  similarity  matches 
on which they are based are highly accessible - indeed, evidence that making such comparisons is
a  passive,  essentially  automatic  process  (see  also  Reber  1967a,  1967b). 

However,  prototypes  also  illustrate  the  limited  usefulness  of  literal  similarity  matches. 
Although  these  implicit  composites  are  often  sufficient  for  recognizing  and  categorizing 
situations, they are of limited use in deriving causal principles. This is because (1) a match based
largely on perceptual commonalities will often fail to contain the correct principles, and (2) even
when some of the correct relations are present, literal similarity matches are too rich to be
inspectable. There is some evidence, albeit indirect, for this notion of rich, noninspectable
representations. Nickerson and Adams (1979) studied people’s memory of the common penny.
Despite the overwhelming amount of experience that the subjects have had with pennies, and
despite their evident ability to recognize and categorize pennies, they were remarkably poor at
recalling or recognizing, given close near misses, the details of how pennies look. This
demonstrates that possessing a description sufficient to pick out a class of objects in ordinary life
is no guarantee that the description can be articulated, or that it is very precise.

Studies of young children show that overall similarity judgments can be difficult to
decompose. As discussed above, young children appear to base their similarity judgments on
some kind of overall similarity (Kemler, 1983). Indeed Shepp (1978) has found that three- and
four-year-olds  are  typically  unable  to  judge  one  dimension  independently  of  another.  For 
example,  they  cannot  ignore  height  when  judging  width.  Unlike  adults,  they  are  unable  to  treat 
length  and  width  as  separable. 


Abstraction  matches  are  at  the  opposite  pole  from  literal  similarity.  An  abstraction  match 
is  likely  to  be  extremely  useful,  in  both  respects:  it  should  contain  the  correct  principle,  and  the 
match  should  be  inspectable.  But  abstractions  are  often  not  particularly  accessible,  especially  for 
novices.  Novice  learners  may  not  know  the  appropriate  abstraction,  or  it  may  be  so  unfamiliar 
that  they  will  not  retrieve  it  when  appropriate.  Thus  abstraction  mappings,  while  ultimately 
important,  are  unlikely  to  play  a  major  role  in  the  early  stages  of  learning. 

Analogies  lie  between  the  highly  accessible  literal  similarity  matches  and  the  highly  useful 
abstraction matches. Potential analogies are less accessible in experiential learning than literal
similarity matches (Gentner & Landers, 1985; Ross, 1984). This is because analogy requires
accessing the learner’s data base via relational matches; object matches are of little or no use.
However, once found, an analogy should be more useful than a literal similarity match in
deriving the key principles, since the shared data structure is sparse enough to permit analysis.
(Of course, educators often explicitly introduce analogies in teaching beginners for exactly this
reason. In this case, the problem of noticing the analogical match is bypassed.) Moreover, by the
systematicity principle, the set of overlapping predicates is likely to include higher-order
relations, such as causality and logical implication. Thus analogy can function to reveal principles
in a domain that previously lacked the appropriate abstractions (Burstein, in press; Carbonell,
1981, in press; Clements, 1982; Darden, in press; Gentner 1980, 1982; Gentner & Gentner, 1983;
Gick & Holyoak, 1983; Hoffman, 1980; Van Lehn & Brown, 1980).

The  Analogical  Shift  Hypothesis  (Gentner,  19815 '  concerns  the  role  of  these  comparisons  in 
experiential learning. In the earliest stages most of the spontaneous matches are either
mere-appearance matches (and thus erroneous) or are literal similarity matches, based on massive
feature overlap. This is to say that initial learning is conservative, based on rich, specific case
kinds of matches. As the domain becomes familiar, more distant comparisons begin to occur:
matches in which fewer object attributes are shared. These sparse comparisons lead to the kinds
of binary connections that form the bulk of the causal corpus - for example, “lighter things go
farther when thrown.” Analogy also serves as a means of introducing structured mental models.
Successful analogies may yield abstractions which can be stored and accessed (Gick & Holyoak,
1983). Winston’s system (see Winston 1980; Winston 1982), which derives if-then rules by
abstracting the predicates common to two analogs, shows how this can be done. Thus, analogy
plays  an  important  role  in  the  middle  and  later  stages  of  learning.  In  the  final  stages,  when 
learning  is  well  advanced,  abstraction  mappings  play  a  major  role. 
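
The idea that a successful analogy can yield a stored abstraction can be illustrated with a small Python sketch. It is not Winston's algorithm, only a minimal stand-in for the general idea of keeping what two analogs have in common: corresponding symbols in the two analogs are replaced by shared variables, and only the expressions that then coincide are kept. The function names, the variable naming scheme (`?v0`, `?v1`, ...), and the attribute `BROWN` are assumptions introduced for the example.

```python
def rename(expr, table):
    """Replace symbols found in table; expressions are nested tuples of strings."""
    if isinstance(expr, tuple):
        return tuple(rename(part, table) for part in expr)
    return table.get(expr, expr)


def abstract_common(base_facts, target_facts, correspondences):
    """Return the expressions shared by both analogs once paired symbols become variables."""
    base_vars, target_vars = {}, {}
    for index, (base_symbol, target_symbol) in enumerate(sorted(correspondences.items())):
        variable = f"?v{index}"
        base_vars[base_symbol] = variable
        target_vars[target_symbol] = variable
    abstract_base = {rename(fact, base_vars) for fact in base_facts}
    abstract_target = {rename(fact, target_vars) for fact in target_facts}
    return abstract_base & abstract_target


water_facts = [
    ("CYLINDRICAL", "beaker"),
    ("IMPLIES",
     ("GREATER-THAN", ("PRESSURE", "water", "beaker"), ("PRESSURE", "water", "vial")),
     ("FLOW", "water", "pipe", "beaker", "vial")),
]
heat_facts = [
    ("BROWN", "coffee"),  # hypothetical attribute of the target object
    ("IMPLIES",
     ("GREATER-THAN", ("TEMPERATURE", "heat", "coffee"), ("TEMPERATURE", "heat", "ice cube")),
     ("FLOW", "heat", "bar", "coffee", "ice cube")),
]
correspondences = {
    "water": "heat", "beaker": "coffee", "vial": "ice cube",
    "pipe": "bar", "PRESSURE": "TEMPERATURE",
}

rule = abstract_common(water_facts, heat_facts, correspondences)
for expression in rule:
    print(expression)
```

The surviving expression is an IMPLIES structure over variables, which can be read as an if-then rule: a difference in some quantity between two locations implies flow between them. The object attributes of each analog do not survive, since they are not shared.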


5.  Stages  of  Understanding 

We  suspect  that  four  kinds  of  mental  models  are  generated  in  the  process  of  understanding 
physical domains. The sequence of models proposed here is developmental, in that the theories of
each stage are generated both by the phenomena being understood and by the theories of the
stage before it. It is not proposed that every person goes through each stage for every domain,
nor that a person is at the same stage in every domain at the same time.

5.1. Stage 1: Protohistories

Suppose  some  new  physical  phenomenon  is  being  observed.  If  there  is  no  prior  model,  all 
one  can  do  is  observe  and  remember  what  is  happening.  We  conjecture  that  the  simplest 
physical  models  of  a  domain  are  protohistories  -  prototype  histories  which  serve  as  summaries  of 
experience.8 Like object prototypes, protohistories are the “most typical instances” of
phenomena. The terms in these descriptions are observables, and their deductive import can be
roughly expressed as “If I see X, then Y will happen (has happened).”

Consider  a  balance  beam  or  seesaw.  If  a  weight  is  placed  on  each  side  of  the  fulcrum,  the 
seesaw  will  either  tilt  counterclockwise,  tilt  clockwise,  or  not  tilt  at  all.  Most  people  have  had 
enough  experiences  with  seesaws  to  have  formed  protohistories  concerning  their  behavior.  By  the 
conjecture  described  here,  a  protohistory  is  automatically  available  whenever  they  encounter  a 
seesaw.  From  it,  they  can  often  predict  which  way  the  particular  seesaw  will  move.  For 
example,  they  may  have  a  protohistory  that  describes  what  happens  if  a  small  person  gets  on  the 
seesaw  opposite  a  large  person. 

However,  the  predictive  power  of  protohistories  is  quite  limited.  There  is  no  guarantee  that 
the  features  matched  actually  correspond  to  relevant  factors.  For  example,  an  observer  will  be 
fooled  when  a  large  person  sits  close  to  the  fulcrum  if  the  observer’s  see-saw  protohistories  have 
been  formed  from  watching  people  sitting  at  equal  distances.  Massive  overlap  in  features  is 
needed  for  reliable  use,  which  means  protohistories  will  yield  conclusions  in  fewer  situations  than 
an  abstract  theory  would.  Consider,  for  example,  two  weights  hung  from  opposite  ends  of  a  stick 
that  is  suspended  by  a  string.  The  principle  involved  is  the  same,  yet  the  situations  look 
dissimilar  enough  that  the  protohistories  for  seesaws  would  not  match.  Furthermore,  there  is  no 
certain way to decide between conflicting results if more than one protohistory matches a
situation.9

5.1.1.  Learning  Protohistories 

The process of constructing protohistories involves dividing up experience into classes
according to literal similarity and abstracting a summary for each class. There has been little
direct research on this process. However, investigations into the process of constructing object
prototypes provide some hints. First, people seem to be able implicitly (i.e., unconsciously) to
compute a kind of component match. Second, this intersection is not merely a simple feature
intersection; rather, it appears that configurations among features are important in the
prototype. Third, once this prototype is computed, it has powerful effects on subsequent
processing of experience. As mentioned previously, once people abstract a prototype from a set
of patterns they may be more confident of having seen the prototype - which was never
presented - than they are of having seen the patterns actually presented (Posner 1967). Finally,
people may not be aware of forming prototypes, except as a general sense of increased familiarity
with a category.

To summarize, if protohistories behave like object prototypes, then they should be found to
(1) be computed implicitly; (2) act as composite concepts; (3) be sensitive to perceptual
configurations among events; and (4) once computed, show the recognition strength and other
psychological privileges of prototypes.

The machine learning research that most closely captures this type of learning is concerned
with conceptual clustering (see Michalski & Stepp 1983, 1983b). So far, such research has focused
on classifying objects that can be characterized mainly by differing attributes. Extending such
techniques  to  describe  situations  that  depend  critically  on  relational  descriptions  could  provide  a 
method  for  computing  protohistories. 
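
To make the idea of computing protohistories concrete, here is a minimal Python sketch that groups remembered episodes by literal similarity and summarizes each group by a prototype. It is not the conceptual clustering method of Michalski and Stepp; it is a simple greedy grouping chosen for brevity. The Jaccard similarity measure, the 0.5 threshold, the majority-vote prototype, and the episode features are all assumptions made for the example, and the flat feature sets ignore the configurations among features that the text says matter.

```python
from collections import Counter


def jaccard(a, b):
    """Shared features divided by all features; a crude stand-in for literal similarity."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def prototype(members):
    """Features present in more than half of the episodes in a class."""
    counts = Counter(feature for episode in members for feature in episode)
    return frozenset(f for f, n in counts.items() if 2 * n > len(members))


def protohistories(episodes, threshold=0.5):
    """Greedily assign each episode to the first sufficiently similar class, then summarize."""
    classes = []
    for episode in episodes:
        episode = frozenset(episode)
        for members in classes:
            if jaccard(episode, prototype(members)) >= threshold:
                members.append(episode)
                break
        else:
            classes.append([episode])
    return [prototype(members) for members in classes]


episodes = [
    {"seesaw", "small-person-left", "large-person-right", "tilts-right", "playground"},
    {"seesaw", "small-person-left", "large-person-right", "tilts-right", "sunny"},
    {"seesaw", "small-person-left", "large-person-right", "tilts-right", "red-seesaw"},
    {"stick-on-string", "weights-hung", "tilts-left"},
]

summaries = protohistories(episodes)
for summary in summaries:
    print(sorted(summary))
```

Incidental features such as the weather drop out of the seesaw summary because they do not recur, while the stick-on-a-string episode fails to match the seesaw class at all, which mirrors the limitation noted earlier: situations governed by the same principle may look too dissimilar for a protohistory to apply.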

5.2.  Stage  2:  The  Causal  Corpus 

Protohistories summarize the phenomena, but they do not constitute a theory of them.
Building a detailed theory directly can be quite difficult. The space of possible models connecting
all observable (and possible) parameters in a typical situation can be quite large. We conjecture
that weaker theories, theories that characterize which parts of the situation are relevant to
desired conclusions, are formed first. In particular, we conjecture that a collection of CAUSE
statements, the causal corpus, is computed from prototype objects and protohistories.

CAUSE  is  viewed  here  as  an  approximation  concept,  a  weak  form  of  ontological 
commitment.  In  particular,  saying 
CAUSE (A,  B) 

expresses  belief  in  the  existence  of  some  mechanism,  specified  by  some  theory  T,  such  that 
A ∧ T → B

Many,  perhaps  most,  of  the  causal  corpus  relations  are  binary  relations  among  variables  -  for 
example, “Bigger objects weigh more.” (Piaget, 1951; Smith, Wiser & Carey, in press), or
“Smaller  objects  have  higher  pitch  when  struck.”  (diSessa,  1983). 

The  notion  of  mechanism  in  the  causal  corpus  is  quite  primitive:  the  causal  beliefs  need  be 
neither  explicit  nor  internally  consistent.  Later  in  the  learning  sequence,  as  we  will  see,  processes 
will  assume  the  role  of  mechanisms  for  physical  domains.  Nevertheless,  we  conjecture  that,  even 
at this early stage, the learner makes a distinction between mechanistic connections and, say,
definitional connections.10 Further, we suspect that many of the initial causal connections are
incorrect.  Novices  often  include  diagnostic  and  correlational  relations  in  their  causal  corpus.  For 
example,  when  asked  if  an  increase  in  the  evaporation  rate  will  cause  a  change  in  the  temperature 
of  the  water,  a  novice  may  reply  "Yes,  because  it  would  have  to  be  hotter  to  evaporate  more." 
But  however  vague  or  confused,  a  causal  attribution  is  a  statement  of  belief  in  some  mechanistic 
connection. 

The distillation of experience from protohistories into the causal corpus serves three
purposes.  First,  it  serves  as  a  means  of  data  reduction.  Second,  it  provides  a  collection  of 
heuristics  that  can  be  used  directly  to  draw  inferences.  Even  if  the  learner  doesn't  have  firm 
grounds  to  consider  the  CAUSE  statements  complete  or  correct,  CAUSE  statements  may  often 
suffice  for  the  desired  class  of  inferences.  Third,  the  collection  of  CAUSE  relations  can  be  used  to 
guide  the  search  for  a  deeper  theory  of  the  domain.  The  CAUSE  statements  suggest  connections 
among  various  aspects  of  the  domain  which  a  deeper  theory  must  either  explain  or  explain  away. 

Returning  to  the  seesaw  example,  suppose  the  causal  corpus  is  now  applied  to  a  balance 
beam built out of blocks. Suppose the two blocks on it are called a and b. The causal corpus
might  be  as  follows: 

CAUSE (BIGGER (a, b), TILT-TOWARDS (a))
CAUSE (FARTHER (a, b), TILT-TOWARDS (a))

These statements can be interpreted as rules in several ways: if one sees that block a is bigger
than block b, one can predict tilt, and if one sees tilt, one may hypothesize that one block is
farther out than another. CAUSE statements are more broadly applicable than protohistories
since they refer to fewer properties. Unlike protohistories, the causal corpus is sparse enough to
be debugged to some degree.

10. For example, the statement below is not a legitimate use of CAUSE, by our account, since the
required axioms do not specify a mechanism:

CAUSE (TRIANGLE (f), HAS-THREE-SIDES (f))

However,  the  approximate  nature  of  the  CAUSE  relation  and  the  binary  characteristic  of  the 
laws  limit  the  ability  to  discriminate  between  conflicting  predictions.  With  the  causal  corpus 
above,  for  instance,  if  block  a  is  bigger  and  block  b  is  farther  out,  we  will  have  two  predictions. 
Inhelder & Piaget (1958) and Siegler (1976, 1981) have documented such a stage in the
development of understanding about the balance beam (with analogous developmental sequences
in other domains). Typically, children’s first causal approach to the problem is to focus on
weight. But there is an interesting second stage when they come to realize that both weight and
distance are important but do not yet know the interrelations. They can manage either property
by itself if the other is constant; but if both properties vary, they tend to focus on one or the
other inconsistently. It is as though they had two separate binary laws. Eventually, they become
able to coordinate weight and distance in the balance beam problem. At this stage, they have
gone beyond the causal corpus. As will be discussed, in order to make more precise inferences the
learner must eventually uncover the mechanisms whose behavior is described by the causal corpus.
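
The conflict between two binary laws can be made concrete with a minimal sketch in Python. It assumes a two-law corpus in which BIGGER and FARTHER each predict tilt toward their first argument; the BIGGER law is inferred from the discussion above rather than quoted, and the names `CAUSAL_CORPUS` and `predict` are purely illustrative.

```python
# Hypothetical two-law causal corpus. Each entry stands for
# CAUSE(<relation>(x, y), TILT-TOWARDS(x)). The BIGGER law is an assumption.
CAUSAL_CORPUS = {
    "BIGGER": "TILT-TOWARDS",
    "FARTHER": "TILT-TOWARDS",
}


def predict(facts):
    """Apply every binary law independently to the observed facts.

    facts is a set of (relation, x, y) triples such as ("BIGGER", "a", "b").
    Returns the set of (effect, x) predictions; nothing arbitrates between them.
    """
    return {
        (CAUSAL_CORPUS[relation], x)
        for (relation, x, _y) in facts
        if relation in CAUSAL_CORPUS
    }


# Block a is bigger and block b is farther out: two incompatible predictions.
conflict = predict({("BIGGER", "a", "b"), ("FARTHER", "b", "a")})
```

Because each law fires on its own, the result contains a prediction of tilt toward a and a prediction of tilt toward b, with no basis in the corpus for choosing between them.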

5.2.1.  Learning  the  Causal  Corpus 

We  suspect  that  there  are  several  techniques  for  computing  and  debugging  a  causal  corpus. 
The  simplest  technique  is  to  hypothesize  causality  from  co-occurrence,  using  rules  like: 

If you always see A before B, then hypothesize CAUSE (A, B)

and

If A is true whenever B is true, then hypothesize CAUSE (A, B)

These  rules  make  certain  assumptions  on  the  form  of  memory,  namely  that  some  number  of 
circumstances can be remembered, and that they can be remembered in sufficient detail that A
and B are either explicitly stored or computable from what is stored. Protohistories should serve
as  a  means  of  initial  data  reduction  from  which  a  causal  corpus  can  be  constructed. 
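
The two co-occurrence rules can be sketched in Python as follows. This is only an illustration under stated assumptions: memory is modelled as a list of remembered episodes (ordered lists of events for the first rule, sets of facts for the second), and the function names are hypothetical, not part of the theory.

```python
from itertools import permutations


def cause_by_precedence(episodes):
    """Rule 1: hypothesize CAUSE(A, B) if A appears before B in every
    remembered episode that contains B. Episodes are ordered lists."""
    events = {e for ep in episodes for e in ep}
    found = set()
    for a, b in permutations(events, 2):
        with_b = [ep for ep in episodes if b in ep]
        if with_b and all(a in ep and ep.index(a) < ep.index(b) for ep in with_b):
            found.add(("CAUSE", a, b))
    return found


def cause_by_cooccurrence(situations):
    """Rule 2: hypothesize CAUSE(A, B) if A holds in every remembered
    situation in which B holds. Situations are sets of facts."""
    facts = {f for s in situations for f in s}
    found = set()
    for a, b in permutations(facts, 2):
        with_b = [s for s in situations if b in s]
        if with_b and all(a in s for s in with_b):
            found.add(("CAUSE", a, b))
    return found


sequences = [["push", "fall"], ["push", "fall", "crash"], ["push"]]
by_order = cause_by_precedence(sequences)

snapshots = [
    {"BIGGER(a,b)", "TILT-TOWARDS(a)"},
    {"BIGGER(a,b)", "TILT-TOWARDS(a)", "RED(a)"},
    {"RED(a)"},
]
by_presence = cause_by_cooccurrence(snapshots)
```

Note that the second rule proposes both CAUSE(BIGGER(a,b), TILT-TOWARDS(a)) and its converse for these snapshots, which illustrates why a corpus built this way needs debugging.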

It  is  not  clear  exactly  how  the  learner  abstracts  out  particular  variables  from  the  rich 
representations  of  the  protohistory  stage.  One  interesting  mechanism  is  suggested  by  Medin  and 
Wattenmaker’s  (in  press)  extension  of  the  context  theory  (Medin  &  Schaffer,  1978).  They  suggest 
an  abstraction  mechanism  whereby  a  similarity  match  which  leads  to  correct  predictions  results 
in common information being augmented; whereas if a similarity match gives wrong predictions,
the differences are augmented. However this is done, the simplification achieved with the causal
corpus is considerable. Another study by Siegler (1978) shows the power of focusing on
particular variables. Three-year-old children were shown a balance beam, asked to predict
which way it would tilt, and then shown what actually occurred. Even after large numbers of
trials, their performance failed to improve. But when they were taught to think of the domain in
terms of a few relevant variables - weight and length - their performance did improve with
experience. The moral to be drawn is that the pace of learning is greatly accelerated when a
small number of variables can be abstracted from all the possibly relevant factors.

As suggested earlier, many of the early causal relations will be incorrect. We suspect that
there exists a class of rules which are used to debug a causal corpus in the face of new
information (c.f. Sussman, 1976). Each rule corresponds to a hypothesis about a bug in the
structure of the causal corpus, such as a missing precondition. We believe that the task of
judging a causal corpus for consistency is an example of an important, but relatively neglected,
kind of learning, coherence-driven learning. Coherence-driven learning is learning that is driven
not by a mismatch between the model and the world but by inconsistencies within the model
itself. Williams, Hollan, & Stevens (1983) found evidence of such learning. They studied a
subject who was learning about a heat exchanger, and noted that one source of insight was a
“boggle” experience, in which the person noticed that a current inference contradicted a prior
belief. We are still examining the criteria for judging the consistency of a causal corpus.11 Such
criteria will play a major role in controlling the debugging rules and in the mixture of generation
and debugging that occurs.

Analogy provides another important technique for extending a causal corpus (see Gentner &
Gentner, 1983; Stevens, 1979). The CAUSE relations from one domain can be mapped into
another, since CAUSE qualifies as a higher-order constraining relation (see also Winston, 198?).

5.3.  Stage 3: Naive Physics

The naive physics models replace CAUSE statements with theories about the specific
mechanisms of change. The ontology is extended by adding processes to explain observed
changes. The ontology also includes properties and objects that are not directly observable (for
example, heat and fluid flow) and the new relationships (such as fluid path and heat path)
required to reason about them.

An  important  advantage  of  these  models  is  the  ability  to  reason  by  exclusion.  In  the  naive 
physics  stage,  unlike  the  previous  stages,  predictions  that  fail  still  yield  information  about  the 
situation. For instance, if fluid is flowing into a container and the level is not rising, then it is
reasonable  to  hypothesize  that  fluid  is  flowing  out  of  it  through  some  unknown  path. 
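
Reasoning by exclusion can be illustrated with a small Python sketch. It assumes, purely for illustration, that each known active influence on a quantity is summarized by a sign (+1 or -1) and that the observed change is also a sign; the function name `hidden_influence` is hypothetical.

```python
def hidden_influence(known_influences, observed_sign):
    """Return the sign of an influence that must be hypothesized, or None.

    known_influences maps an influence name to +1 or -1.
    observed_sign is the observed direction of change: -1, 0, or +1.
    """
    signs = set(known_influences.values())
    if not signs:
        # Nothing known is acting, so any observed change needs a new cause.
        return observed_sign or None
    if signs == {+1} and observed_sign <= 0:
        return -1  # something unknown must be pulling the quantity down
    if signs == {-1} and observed_sign >= 0:
        return +1  # something unknown must be pushing the quantity up
    return None  # the known influences can already account for the observation


# Fluid flows in, yet the level is not rising: hypothesize an outflow.
unknown = hidden_influence({"flow-in": +1}, 0)
```

When the known influences have mixed signs the sketch returns None, since a qualitative model cannot rule out any direction of change in that case.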

Returning to our balance beam, a process SWING might be used to describe rotation around
a contact point (see Figure 7). The preconditions describe the geometric configuration of the
system, and the quantity condition says that SWING will occur whenever there is a non-zero
angular velocity. SWING directly influences the angular position of the beam. Thus a prediction
concerning tilt becomes a prediction about which instance, if any, of the SWING process will be
active.12


What influences ANGULAR-VELOCITY? The existence of an ANGULAR-ACCELERATION process (see
Figure 8) that directly influences ANGULAR-VELOCITY whenever there is a net torque will be
assumed. It is further assumed that

∀ x, cp
    PHYSOB(x) ∧ CONTACT-POINT(cp)
        ⇒ NET-TORQUE(x, cp) = SUM-OF(TORQUES-ON(x, cp))

In other words, the net torque on an object around a contact point is the sum of the torques on
that object measured about that contact point. The mass of the beam will be ignored, and the pull

With [...] of the University of Chicago, we are investigating the role of [...] in
coherence-driven learning.

Figure 7 - A SWING process

A  SWING  process  describes  rotation  of  an  object  around  another  object.  For  the  balance  beam 
there  will  be  two  instances  of  this  process,  differing  only  in  their  bindings  for  the  direction  dir. 
In  each  instance  b  will  be  bound  to  the  beam,  c  will  be  bound  to  the  fulcrum,  and  cp  will  be 
bound to the contact point between them.

It is assumed that each physical object (PHYSOB) has quantities to represent its angular
position and velocity with respect to each point of contact with other objects. Directions will be
noted  by  the  symbols  CW,  CCW,  and  NULL,  corresponding  to  clockwise  rotation,  counterclockwise 
rotation,  and  no  rotation. 

Process SWING

Individuals:
    b a PHYSOB
    c a PHYSOB
    cp a CONTACT-POINT
    dir a DIRECTION

Preconditions:
    MOBILE(b)
    not MOBILE(c)
    CONNECTED(b, c, cp)
    ROTATION-FREE(b, c, cp)
    DIRECTION-OF(dir, ANGULAR-VELOCITY(b, cp))

Quantity Conditions:
    Am[ANGULAR-VELOCITY(b, cp)] > ZERO

Influences:
    I+(ANGULAR-POSITION(b, cp), A[ANGULAR-VELOCITY(b, cp)])


Figure 8 - ANGULAR-ACCELERATION process

Process ANGULAR-ACCELERATION

Individuals:
    b a PHYSOB
    c a PHYSOB
    cp a CONTACT-POINT
    dir a DIRECTION

Preconditions:
    MOBILE(b)
    not MOBILE(c)
    CONNECTED(b, c, cp)
    ROTATION-FREE(b, c, cp)
    DIRECTION-OF(dir, NET-TORQUE(b, cp))

Quantity Conditions:
    Am[NET-TORQUE(b, cp)] > ZERO

Relations:
    Let acc be a quantity
    acc ∝Q+ NET-TORQUE(b, cp)
    acc ∝Q- MASS(b)

Influences:
    I+(ANGULAR-VELOCITY(b, cp), A[acc])


of  gravity  on  the  blocks  on  each  side  of  the  fulcrum  will  be  assumed  to  be  the  only  source  of 
torques.  Figure  9  describes  this  induced  torque  by  means  of  an  individual  view.  Notice  that  the 
factors  illuminated  in  the  causal  corpus  of  BIGGER  and  FARTHER  have  become  the  quantities 
MASS and DISTANCE, and their role in producing swinging has been explicated. In particular,
these properties determine how much torque each block places on the beam. The sum of the
torques determines the net torque, which can cause the beam to accelerate and thus swing.

This model comes one step closer to a model that can always determine which way
something will tilt. There will still be cases in which exactly what will happen cannot be determined
(e.g., if the mass on one side is increased and it is brought closer to the pivot), but this is a
precise hypothesis about what all the relevant factors are.
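
The residual ambiguity can be illustrated with a short Python sketch of sign arithmetic over qualitative proportionalities. The encoding is an assumption made for illustration: each proportionality is stored as +1 (for ∝Q+) or -1 (for ∝Q-), each change as a sign, and `None` stands for "cannot be determined".

```python
def qualitative_change(proportionalities, changes):
    """Combine the signs contributed by each changing quantity.

    proportionalities maps a quantity name to +1 (∝Q+) or -1 (∝Q-).
    changes maps a quantity name to the sign of its change (-1, 0, +1).
    Returns -1, 0, +1, or None when the contributions disagree.
    """
    contributions = {
        proportionalities[q] * changes.get(q, 0) for q in proportionalities
    }
    contributions.discard(0)  # unchanged quantities contribute nothing
    if not contributions:
        return 0
    if len(contributions) == 1:
        return contributions.pop()
    return None  # opposing contributions: qualitatively ambiguous


# Torque depends positively on both MASS and DISTANCE.
torque_laws = {"MASS": +1, "DISTANCE": +1}

heavier_only = qualitative_change(torque_laws, {"MASS": +1})
heavier_but_closer = qualitative_change(torque_laws, {"MASS": +1, "DISTANCE": -1})
```

Increasing the mass alone gives a definite increase in torque, while increasing the mass and reducing the distance leaves the direction of change undetermined without numerical information.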

5.3.1.  Learning  Naive  Physics 

The major problem in learning a naive physics is constructing a vocabulary of processes
that consistently describes experience. The learner must strip away the irrelevant predicates
that are part of the protohistories and causal corpus and construct more appropriate
Figure 9 - A description of gravity-induced torque

Positive torques are assigned to clockwise (CW) and negative torques are assigned to
counter-clockwise (CCW).

Individual View GRAVITY-INDUCED-TORQUE

Individuals:
    b a PHYSOB
    c a PHYSOB
    d a PHYSOB
    cp a CONTACT-POINT

Preconditions:
    CONNECTED(b, c, cp)
    ON(d, b)

Relations:
    Let f be a quantity
    f ∈ TORQUES-ON(b, cp)
    f ∝Q+ DISTANCE(C-M(d), cp)
    f ∝Q+ MASS(d)
    ON(C-M(d), SIDE-OF(CW, b, cp)) iff As[f] = 1
    ON(C-M(d), SIDE-OF(CCW, b, cp)) iff As[f] = -1
    ON(C-M(d), SIDE-OF(NULL, b, cp)) iff As[f] = 0



descriptions.  In  addition,  the  learner  must  sometimes  hypothesize  the  existence  of  objects  and 
properties  that  are  not  directly  observable.  Research  in  machine  learning,  particularly  the  work 
on  explanation-based  learning  (Mitchell,  1986;  DeJong  &  Mooney,  1986),  should  be  useful  here. 
Several  researchers  are  already  beginning  to  directly  address  such  problems  in  modelling  scientific 
discovery (Langley, 1983; Falkenhainer, 1985; Rajamoney, DeJong, & Faltings, 1985).

The causal corpus provides a search space for potential process vocabularies. Each
statement in the causal corpus must be elaborated into a consequence of a process vocabulary. It
appears  that  there  is  only  a  small  number  of  distinct  ways  to  perform  the  elaboration,  depending 
on  the  particular  form  of  the  arguments.  For  example,  the  statement 

The decrease in AMOUNT-OF q causes the LEVEL of q to fall.


indicates that some active process (or individual view) in the situation contains the statement

LEVEL(q) ∝Q+ AMOUNT-OF(q)


Hypothesizing  a  process  vocabulary  from  a  causal  corpus  should  be  much  simpler  than 
working  from  protohistories  or  direct  observation.  Yet  it  still  appears  difficult.  We  conjecture 
that  there  are  several  constraints  that  make  the  problem  more  tractable.  First,  people  are 
apparently  conservative  in  the  introduction  of  unobservable  properties.  For  example,  some 
subjects  have  a  model  of  a  domain  that  appears  to  be  organized  around  one  parameter  -  a 
“generalized  strength”  attribute.  In  reasoning  about  fluids,  for  instance,  they  appear  to  use 
pressure, flow rate, and velocity as different names for the same thing. In electricity, they use
voltage, current, power, potential, and velocity of electrons interchangeably. The advantage of
this  generation  strategy  is,  of  course,  that  simpler  models  will  be  explored  first,  with  further 
distinctions  made  only  when  necessary.  Second,  some  physical  laws  are  used  as  constraints  on 
what  process  vocabularies  are  possible.  Conservation  of  energy,  for  example,  demands  that  if  a 
process  directly  influences  a  quantity  representing  some  form  of  energy,  it  must  also  directly 
influence  some  other  quantity  representing  some  form  of  energy,  but  in  the  opposite  direction. 
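
The conservation constraint can be expressed as a simple filter on candidate processes. The Python sketch below is illustrative only: a candidate process is reduced to a list of (sign, quantity) direct influences, and the set of quantities that represent energy is supplied by hand; both encodings and the function name are assumptions.

```python
def respects_energy_conservation(influences, energy_quantities):
    """Check that every direct influence on an energy quantity is matched by
    an opposite-signed direct influence on some other energy quantity.

    influences is a list of (sign, quantity) pairs with sign +1 or -1.
    energy_quantities is the set of quantity names that represent energy.
    """
    energy = [(s, q) for s, q in influences if q in energy_quantities]
    return all(
        any(s2 == -s1 and q2 != q1 for s2, q2 in energy)
        for s1, q1 in energy
    )


energy_names = {"HEAT(source)", "HEAT(destination)"}

# A candidate heat-flow process that drains one object and fills another.
balanced = respects_energy_conservation(
    [(-1, "HEAT(source)"), (+1, "HEAT(destination)")], energy_names
)

# A candidate that creates heat from nothing is rejected.
unbalanced = respects_energy_conservation(
    [(+1, "HEAT(destination)")], energy_names
)
```

A filter of this kind prunes the space of process vocabularies before any of them is tested against experience.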

Once  again,  analogy  can  provide  a  constructive  mechanism.  It  can  be  used  to  import 
candidate processes from previously understood domains - for example, when one understands
electricity in terms of water flow (Gentner & Gentner, 1983) or evaporation in terms of an
implicit model of rocket ships escaping from earth (Collins & Gentner, in press). This is an
especially  powerful  mechanism  because  if  the  model  for  the  previous  domain  is  consistent  with 
physical  laws,  then  it  suggests  that  the  model  for  the  new  domain  may  be  so  as  well. 

We  can  illustrate  this  with  an  analogy  from  liquid-flow  to  heat-flow.  Recall  the  liquid  flow 
model  presented  in  Section  2.  Figure  10  illustrates  a  collection  of  assertions  which  describe  the 
consequences  of  a  particular  instance  of  LIQUID-FLOW.14  Suppose  a  person  hypothesizes  that 
there is a process of heat flow which is analogous to the process of liquid flow. By the
structure-mapping theory, this means that the person suspects that a similar relational structure
holds among the objects in the heat-flow situation (the coffee, the ice cube, the silver bar, and the
instance of heat flow) as among the objects in the liquid-flow situation (the water in the
beaker, the water in the vial, the pipe, and the instance of liquid flow). Mapping the systematic
relational structure (see Figure 11) leads to several predictions that the person can check to see
whether the analogy is correct. For example, it can be determined whether or not the
temperature of the ice cube is rising and the temperature of the coffee falling. The
structure-mapping rules for analogy have provided an initial model for the process of heat flow; in
particular, the preconditions, quantity conditions, relations, and influences are all carried across
from liquid flow. Note that to make the analogy really work, a new kind of object - a
HEAT-PATH - must be postulated. Thus analogy can provide candidates for extending ontologies.
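
The transfer step can be sketched in Python as a substitution over nested relational expressions. Everything specific here is an assumption made for illustration: the base expression is not taken from Figure 10, the correspondences are chosen by hand rather than computed, and `transfer` is a hypothetical helper, not the structure-mapping algorithm itself.

```python
def transfer(expression, correspondences):
    """Carry a relational structure into the target domain by substituting
    corresponding items while keeping the relational skeleton intact."""
    if isinstance(expression, tuple):
        return tuple(transfer(part, correspondences) for part in expression)
    return correspondences.get(expression, expression)


# Hypothetical base structure for a liquid-flow situation.
liquid_flow = (
    "CAUSE",
    ("GREATER", ("PRESSURE", "beaker-water"), ("PRESSURE", "vial-water")),
    ("FLOW", "beaker-water", "vial-water", "water", "pipe"),
)

# Hand-chosen correspondences between the two situations.
correspondences = {
    "beaker-water": "coffee",
    "vial-water": "ice-cube",
    "pipe": "silver-bar",   # plays the role of the postulated HEAT-PATH
    "water": "heat",
    "PRESSURE": "TEMPERATURE",
}

heat_flow = transfer(liquid_flow, correspondences)
```

The higher-order CAUSE and GREATER relations survive the mapping unchanged, which is what makes the resulting structure a source of checkable predictions about the heat-flow situation.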

5.4.  Stage  4:  Expert  Models 

The models generated so far have two important limitations. First, they still contain
fundamental ambiguities, ambiguities which are inherent in the nature of qualitative

These descriptions were generated by an early version of GIZMO, a computer program constructed to
explore the computational aspects of QP theory. GIZMO was designed to make predictions and
interpret measurements, not to be a learning system. In particular, these descriptions were not
generated with learning or analogy in mind.

Of course such extensions are not to be made lightly. The authors suspect that new types of objects
are postulated in the target domain only when necessary to preserve a much larger systematic
structure.
Figure 10 - Relational structure for an instance of liquid flow

Depicted  below  are  several  important  conclusions  which  follow  from  the  definition  of  liquid  flow 
presented  in  Figure  2  and  the  assumption  that  an  instance  of  liquid  flow  exists  involving  the 
liquids  in  the  two  containers.  Specifically,  they  describe  the  conditions  for  and  consequences  of 
the  process  instance  pi-0  being  active. 
Figure 11 - Relational structure transferred to heat flow

Here  the  relational  structure  describing  a  situation  involving  liquid  flow  has  been  transferred  to  a 
situation  involving  heat  flow.  Notice  the  systematicity  of  the  relational  structure,  as  indicated 
by  the  nested  chains  of  implications. 
representations.15 Second, they lack domain-independent generalizations (except in the raw form
of  the  representation  -  CAUSE  statements,  processes,  and  so  on).  The  final  stage  of  learning 
consists  of  overcoming  these  limitations,  of  discovering  ways  to  resolve  ambiguities  and  to 
construct  powerful  generalizations. 

Clearly  several  kinds  of  knowledge  are  involved,  and  the  potential  complexity  of  the  models 
in  this  stage  is  open-ended  (it  includes  the  whole  of  mathematical  physics,  for  example). 
Examples  of  the  kinds  of  knowledge  involved  include  equations  to  describe  relationships  between 
parameters,  “rules  of  thumb”  to  specify  useful  default  resolutions  for  ambiguities,  and  new 
ontologies  to  allow  reasoning  about  more  complex  systems.  The  importance  of  mathematical 
models is fairly obvious. The rules of thumb are less obvious but equally important (see e.g.,
Lenat, 1982). In physical domains they include empirical knowledge about the circumstances
under  which  certain  processes  can  be  ignored  (such  as  evaporation  when  water  is  poured  from  one 
glass  to  another)  and  what  their  net  effect  is  (such  as  Black’s  law  for  the  temperature  of 
mixtures).  Finally,  different  ontologies  are  sometimes  necessary  to  deal  with  certain  types  of 
complex  systems.  In  the  process-oriented  physics  discussed  here,  describing  flow  requires  finding 
flow  paths.  Finding  flow  paths  in  complex  networks  such  as  electrical  circuits  can  quickly 
become  computationally  intractable;  switching  to  a  device-centered  physics  (such  as  that 
described in de Kleer & Brown, 1983) can reduce the computational burden to manageable
proportions  for  such  systems. 

To  complete  the  balance  beam  example,  we  know  that  the  force  of  a  block  on  the  beam  is 
qualitatively  proportional  to  the  mass  of  the  block  and  to  the  distance  from  the  fulcrum.  If  we 
also  know  that  the  torque  is  the  product  of  distance  and  weight,  then  providing  numerical  values 
for  these  quantities  will  allow  an  unambiguous  prediction  about  tilt. 
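
With numerical values, the prediction reduces to a signed sum of products. The Python sketch below assumes the sign convention of Figure 9 (clockwise torques positive, counter-clockwise negative); the function name and the tuple encoding of blocks are illustrative choices.

```python
def predict_tilt(blocks):
    """Predict the direction of rotation of the beam.

    blocks is an iterable of (weight, distance, side) triples, where side is
    "CW" or "CCW" according to which way the block would turn the beam.
    Returns "CW", "CCW", or "NULL" when the torques cancel.
    """
    net_torque = sum(
        weight * distance if side == "CW" else -weight * distance
        for weight, distance, side in blocks
    )
    if net_torque > 0:
        return "CW"
    if net_torque < 0:
        return "CCW"
    return "NULL"


# A heavier block close in against a lighter block farther out.
outcome = predict_tilt([(3, 2, "CW"), (2, 4, "CCW")])
balanced_outcome = predict_tilt([(3, 2, "CW"), (2, 3, "CCW")])
```

The case that was ambiguous for the causal corpus and for the qualitative model (bigger on one side, farther out on the other) now has a single answer.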

5.4.1.  Learning  Expert  Models 

The  transition  to  expert  models  involves  several  kinds  of  learning.  Some  aspects  of  this 
transition  probably  lie  outside  the  scope  of  experiential  learning;  for  example,  people  typically 
learn  mathematical  models  by  being  taught  rather  than  by  discovery.  Some  aspects  of  this 
learning  -  such  as  developing  new  ontologies  -  involve  improving  the  content  of  the 
representations.  Other  aspects  of  the  transition  from  a  naive  physics  to  an  expert  physics  are 
better  described  as  translating  the  existing  qualitative  representations  into  quantitative 
statements,  using  mathematics  to  express  laws.  By  converting  a  physical  theory  into  a 
mathematical model, the learner gains the ability to make precise predictions and to recognize
powerful generalizations more easily. An important part of this refinement is to elaborate ∝Q+
statements into constraint equations. Langley (1979; Langley, Zytkow, Simon, & Bradshaw,
1983) and Falkenhainer (1985) describe techniques that should be useful for converting
qualitative laws into mathematical relations.

Developing rules of thumb means knowing not just what is possible, but what is probable.
The learner must discover which outcomes raised by qualitative reasoning are likely or unlikely
and which potential interactions can be ignored. The techniques developed in machine learning
for acquiring heuristics should be directly applicable (c.f. Lenat, 1982; Mitchell, 1981). In
addition, the authors suspect the possible behaviors raised by naive physics are compared against
known protohistories. Hypothesized outcomes that have no corresponding protohistory are
judged unlikely, and those corresponding to a highly familiar and accessible protohistory are
judged very likely (see Tversky & Kahneman, 1973).

Further, it seems likely that at least some expert rules of thumb derive from learning new
and better protohistories. This intuition is based in part on research in automaticity (Schneider
& Fisk, 1983). It has been demonstrated that, given an orderly domain and sufficient practice,
adult subjects can learn a response pattern well enough so that it becomes essentially
effortless (see also Anderson, Rumelhart & Norman, 1978). Moreover, there is some
transfer from this over-learned material to new similar material. These learned sequences have
many  of  the  essential  qualities  of  protohistories.  First,  they  are  triggered  by  recognition  (in  the 
terms  used  here,  by  a  literal  similarity  match  between  the  present  situation  and  a  stored 
situation).  Second,  computing  and  carrying  out  the  procedures  that  follow  from  the  match  is 
automatic; virtually no attentional resources are required. Third, these computations are
implicit;  subjects  are  typically  poor  at  introspecting  about  what  they  are  doing,  and  when  they 
do  introspect,  it  can  interfere  with  the  response  (Brooks,  1978;  Reber,  1967,  1976).  It  may  be  too 
simplistic  to  view  protohistories  as  a  special  case  of  automatic  pattern-response  combination. 
Nevertheless,  there  is  enough  overlap  in  the  phenomena  to  allow  some  confidence  that 
protohistories can continue to be learned at all stages of expertise. Of course, the contents of
expert  protohistories  may  be  different  from  those  of  novices,  since  experts'  protohistories  may 
reflect  a  more  advanced  ontology,  as  discussed  below.  However,  the  mechanism  of  a 
perceptually-triggered automatic match should be the same.

We  suspect  that  ontological  shift  is  driven  both  by  the  desire  to  understand  more  complex 
physical  systems  and  by  the  emergence  of  domain-independent  mathematical  abstractions.  As 
an  example  of  the  first  kind,  consider  the  problem  of  reasoning  about  fluid  flow  in  a  complex 
system, such as a steam plant. Hayes (1979b) has distinguished two separate ontologies for
liquids:  a  contained-liquid  ontology,  in  which  liquid  is  thought  of  as  the  fluid  in  a  place  and  a 
molecular  collection  ontology,  in  which  water  is  thought  of  as  little  bits  of  fluid  that  move  around 
inside  the  system.  The  contained-liquid  ontology  is  appropriate  if  the  goal  is  to  determine  what 
flows can occur. However, it will not help us determine how changes in the properties of the
working  fluid  in  one  part  of  the  system  (say,  the  rising  temperature  of  the  inlet  water  in  a  boiler) 
can  affect  properties  of  the  fluid  in  another  part  of  the  system  (say,  temperature  of  the  steam 
coming  out  of  the  boiler’s  superheater).  In  this  case,  liquid  must  be  viewed  in  terms  of  molecular 
collections  that  move  around  inside  the  system.  Conversely,  establishing  flows  using  the 
molecular  collection  view  is  very  difficult.  A  learner  with  only  one  of  these  two  ontologies  will 
have  a  difficult  time  with  certain  questions,  and  such  difficulties  may  drive  the  search  for  a  new 
ontology. 

Mathematical  abstractions  provide  another  important  driving  force  in  ontological  change. 
In system dynamics, for example, physical systems involving fluid elements, mechanical elements,
thermal elements, and acoustical elements are viewed as variations on a common, abstract
theme. This means that the analysis and synthesis tools developed for abstract mathematical
models can be used to solve problems in several domains. This is a powerful motivation, as
evidenced by the wave of interest in attempting diverse applications evoked by the publication
of certain new mathematical formalisms (e.g., catastrophe theory and fractal geometry).
domains. The learning sequence is built around three ideas. First, development proceeds from
rich to sparse, and from concrete to abstract; that is, initial representations differ from later
representations in containing more information, and in particular, more context-specific
information. Second, after sufficient experience, people develop experiential models that are
centered around the notion of physical process, as described by Qualitative Process theory.
Third, implicit processes of comparing and mapping between stored knowledge and a current
situation, as described in structure-mapping theory, are central to experiential learning.

Four stages of experiential learning have been laid out: protohistories, the causal corpus,
naive physics and expert models. The first stage, that of protohistories, embodies the idea that
early representations are rich and context specific; this stage attempts to capture a combination
of evidence from developmental patterns, similarity judgments, basic level categories and object
prototypes. The third stage is the process-centered stage described by Qualitative Process
theory. The fourth stage builds on the third stage models, adding domain-independent
generalizations and in some cases mathematical models. There is some evidence for the third and
possibly the fourth stages in the research on expertise under the rubric of the novice-expert shift
(Chi, Feltovich, and Glaser, 1981; Larkin, 1983).

The second stage, the causal corpus, is the most speculative. There is no direct evidence for
its existence, nor do we currently have a detailed theory of the kinds of causal statements that
can enter into the representations. Moreover, detailing how the causal corpus emerges from
protohistories will not be easy. But something like the causal corpus seems necessary: a
collection of simplistic, mostly binary directed regularities among dimensions and quantities that
begin to be differentiated out of the tangled representations of the protohistory stage. The
learner can use these simple assertions as grist for further progress.

What happens to prior stages as new stages occur? First, stored representations have to be
distinguished from new learning. We conjecture that learners retain much of their stored
knowledge even when they go beyond the stage at which it was formed. Thus, a hydraulics
engineer still uses the same protohistory he or she formed as a child to decide how fast to carry a
glass of water without spilling it. And, as de Kleer (1979) points out, expert physicists and
engineers do not always resort to quantitative models (fourth stage); frequently the answer they
want can be obtained by using a good qualitative model (third stage).

But what about new learning? Does new learning occur only at the leading edge, or do
people continue to learn at levels below the most advanced stage they have attained? We suspect
that even experts continue to learn at all prior stages, with the possible exception of the causal
corpus. As described earlier, there is evidence that experts continue to lay down
protohistories. Similarly, learners who are operating at the fourth stage, that of expert models,
increases the least new learning is expected within the causal corpus.

Of the four levels, the causal corpus has the least claim to continued independent existence
in an advanced expert. The causal corpus is not reliable for prediction, nor does it possess the
advantages of automaticity.18 In summary, the overall picture is that a learner moves from rich,
perceptually  specific  protohistories  to  the  sparser  representation  of  the  causal  corpus.  The 
causal  corpus  serves  as  a  staging  area  in  which  rough  connections  among  variables  can  be  stored 
until  they  can  be  subsumed  into  a  true  system.  If  learning  continues,  a  person  develops  a 
process-centered  naive  physics,  and,  for  some  domains,  expert  models. 